Efficient Algorithms for Identifying Relevant Features
Abstract
This paper describes efficient methods for exact and approximate implementation of the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This bias is useful for learning domains where many irrelevant features are present in the training data. We first introduce FOCUS-2, a new algorithm that exactly implements the MIN-FEATURES bias. This algorithm is empirically shown to be substantially faster than the FOCUS algorithm previously given in [Almuallim and Dietterich, 1991]. We then introduce the Mutual-Information-Greedy, Simple-Greedy and Weighted-Greedy algorithms, which apply efficient heuristics for approximating the MIN-FEATURES bias. These algorithms employ greedy heuristics that trade optimality for computational efficiency. Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to preprocess the training data by eliminating the irrelevant features from ID3's consideration. In particular, the Weighted-Greedy algorithm provides an excellent and efficient approximation of the MIN-FEATURES bias.
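To make the MIN-FEATURES bias concrete, the sketch below searches feature subsets in order of increasing size and returns the first one on which no two training examples agree while carrying different labels. It is a brute-force illustration of the bias only, not the FOCUS-2 algorithm or any of the greedy heuristics named in the abstract; the function names `is_sufficient` and `min_features` and the XOR toy data are assumptions made for this example.

```python
from itertools import combinations, product


def is_sufficient(examples, features):
    """Return True if no two examples agree on `features` but differ in label."""
    seen = {}
    for x, label in examples:
        key = tuple(x[i] for i in features)
        # setdefault stores the first label seen for this projection.
        if seen.setdefault(key, label) != label:
            return False
    return True


def min_features(examples, n_features):
    """Return a smallest sufficient feature subset, or None if the data conflict.

    Hypothetical helper: exhaustive search by increasing subset size,
    so its cost grows exponentially with the number of relevant features.
    """
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_sufficient(examples, subset):
                return subset
    return None


# Toy data (assumed for illustration): the label is x0 XOR x2,
# while x1 and x3 are irrelevant features.
examples = [(x, x[0] ^ x[2]) for x in product((0, 1), repeat=4)]
print(min_features(examples, 4))
```

On this toy data the search returns the two relevant feature indices; a learner such as ID3 could then be trained on the examples projected onto that subset.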
Similar resources
Learning Boolean Concepts in the Presence of Many Irrelevant Features
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires $\Theta\left(\frac{1}{\epsilon}\ln\frac{1}{\delta} + \frac{1}{\epsilon}\left[2^p + p\ln n\right]\right)$ training examples to guarantee PAC-learning a concept having p relev...
Call Attention to Rumors: Deep Attention Based Recurrent Neural Networks for Early Rumor Detection
The proliferation of social media in communication and information dissemination has made it an ideal platform for spreading rumors. Automatically debunking rumors at their stage of diffusion is known as early rumor detection, which refers to dealing with sequential posts regarding disputed factual claims with certain variations and highly textual duplication over time. Thus, identifying trending ...
A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks
The rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the rumor language can be helpful in identifying it. The previous research has focused more on the contextual information to reply tweets and less on the content features of the original rumor to address the rumor detection problem. Most of the studies have been in...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Basic methods in this field used obvious traffic features, such as port number and protocol type, to classify the traffic type. However, recent changes in applicat...
Novel Randomized Feature Selection Algorithms
Feature selection is the problem of identifying a subset of the most relevant features in the context of model construction. This problem has been well studied and plays a vital role in machine learning. In this paper we present three randomized algorithms for feature selection. They are generic in nature and can be applied for any learning algorithm. Proposed algorithms can be thought of as a ...